4 research outputs found

Design considerations for workflow management systems use in production genomics research and the clinic

Author: Ahmed Azza E.
Allen Joshua M.
Bhat Tajesvi
Burra Prakruthi
Fadlelmola Faisal M.
Fliege Christina E.
Hart Steven N.
Heldenbrand Jacob R.
Hudson Matthew E.
Istanto Dave Deandre
Kalmbach Michael T.
Kapraun Gregory D.
Kendig Katherine I.
Kendzior Matthew Charles
Klee Eric W.
Mainzer Liudmila S.
Mattson Nate
Ross Christian A.
Sharif Sami M.
Venkatakrishnan Ramshankar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2021

Abstract: The changing landscape of genomics research and clinical practice has created a need for computational pipelines capable of efficiently orchestrating complex analysis stages while handling large volumes of data across heterogeneous computational environments. Workflow Management Systems (WfMSs) are the software components employed to fill this gap. This work provides an approach to, and a systematic evaluation of, key features of popular bioinformatics WfMSs in use today: Nextflow, CWL, and WDL and some of their executors, along with Swift/T, a workflow manager commonly used in high-scale physics applications. We employed two use cases: a variant-calling genomic pipeline and a scalability-testing framework, both of which were run locally, on an HPC cluster, and in the cloud. This allowed for evaluation of those four WfMSs in terms of language expressiveness, modularity, scalability, robustness, reproducibility, interoperability, and ease of development, along with adoption and usage in research labs and healthcare settings. This article tries to answer the question: which WfMS should be chosen for a given bioinformatics application, regardless of analysis type? The choice of a given WfMS is a function of both its intrinsic language and engine features. Within bioinformatics, where analysts are a mix of dry and wet lab scientists, the choice is also governed by collaborations and adoption within large consortia and technical support provided by the WfMS team/community. As the community and its needs continue to evolve along with computational infrastructure, WfMSs will also evolve, especially those with permissive licenses that allow commercial use. In much the same way as the dataflow paradigm and containerization are now well understood to be very useful in bioinformatics applications, we will continue to see innovations of tools and utilities for other purposes, like big data technologies, interoperability, and provenance.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Directory of Open Access Journals

Dissertations of the University of Groningen

Correction to: Recommendations for performance optimizations when using GATK3.8 and GATK4

Author: Derek E. Wildman
Eric D. Wieben
Eric W. Klee
Jacob R. Heldenbrand
Katherine I. Kendig
Liudmila S. Mainzer
Mathieu Wiepert
Matthew A. Bockol
Matthew E. Hudson
Michael T. Kalmbach
Nathan R. Mattson
Ravishankar K. Iyer
Saurabh Baheti
Steven N. Hart
Travis M. Drucker
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Managing genomic variant calling workflows with Swift/T.

Author: Azza E Ahmed
Daniel S Katz
Elliott Rodriguez
Faisal M Fadlelmola
Jacob Heldenbrand
Jennie Zermeno
Justin M Wozniak
Katherine Kendig
Liudmila S Mainzer
Matthew C Kendzior
Matthew R Weber
Tiffany Li
Yan Asmann
Yingxue Ren
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

Bioinformatics research is frequently performed using complex workflows with multiple steps, fans, merges, and conditionals. This complexity makes management of the workflow difficult on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Scientific workflow management systems could help with that. Many are now being proposed, but is there yet a "best" workflow management system for bioinformatics? Such a system would need to satisfy numerous, sometimes conflicting requirements: from ease of use, to seamless deployment at peta- and exa-scale, and portability to the cloud. We evaluated Swift/T as a candidate for such a role by implementing a primary genomic variant calling workflow in the Swift/T language, focusing on workflow management, performance, and scalability issues that arise from production-grade big data genomic analyses. In the process we introduced novel features into the language, which are now part of its open repository. Additionally, we formalized a set of design criteria for quality, robust, maintainable workflows that must function at scale in a production setting, such as a large genomic sequencing facility or a major hospital system. The use of Swift/T conveys two key advantages. (1) It operates transparently in multiple cluster scheduling environments (PBS Torque, SLURM, Cray aprun environment, etc.), so a single workflow is trivially portable across numerous clusters. (2) The leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, which makes it easy to maintain and to request resources optimal for each stage of the pipeline. While Swift/T's data-level parallelism eliminates the need to code parallel analysis of multiple samples, it does make debugging more difficult, as is common for implicitly parallel code. Nonetheless, the language gives users a powerful and portable way to scale up analyses in many computing architectures.
The code for our implementation of a variant calling workflow using Swift/T can be found on GitHub at https://github.com/ncsa/Swift-T-Variant-Calling, with full documentation provided at http://swift-t-variant-calling.readthedocs.io/en/latest/

Directory of Open Access Journals

Recommendations for performance optimizations when using GATK3.8 and GATK4

Author: A McKenna
B Rabbani
C Raczy
CH Costa
D Decap
Derek E Wildman
Eric D Wieben
Eric W Klee
GA Van der Auwera
H Mushtaq
Jacob R Heldenbrand
JM Zook
Katherine I Kendig
L Deng
Liudmila S Mainzer
M Massie
M Plüss
MA DePristo
Mathieu Wiepert
Matthew A Bockol
Matthew E Hudson
Michael T Kalmbach
ML Metzker
MW Allard
N Kathiresan
NA Miller
Nathan R Mattson
Ravishankar K Iyer
S Goodwin
S-M Liu
Saurabh Baheti
Steven N Hart
Travis M Drucker
Publication venue: 'Springer Science and Business Media LLC'

Crossref